

**AFRL-IF-WP-TR-2002-1523**

**POWER ESTIMATION AND  
SYNTHESIS FOR LOW POWER**

**Dr. Kaushik Roy**

**Purdue University  
Department of Electrical and Computer Engineering  
1285 Electrical Engineering Building  
West Lafayette, IN 47907-1285**



**AUGUST 2002**

**Final Report for 22 March 1995 – 30 April 1999**

**Approved for public release; distribution is unlimited.**

**INFORMATION DIRECTORATE  
AIR FORCE RESEARCH LABORATORY  
AIR FORCE MATERIEL COMMAND  
WRIGHT-PATTERSON AIR FORCE BASE, OH 45433-7334**

## NOTICE

USING GOVERNMENT DRAWINGS, SPECIFICATIONS, OR OTHER DATA INCLUDED IN THIS DOCUMENT FOR ANY PURPOSE OTHER THAN GOVERNMENT PROCUREMENT DOES NOT IN ANY WAY OBLIGATE THE US GOVERNMENT. THE FACT THAT THE GOVERNMENT FORMULATED OR SUPPLIED THE DRAWINGS, SPECIFICATIONS, OR OTHER DATA DOES NOT LICENSE THE HOLDER OR ANY OTHER PERSON OR CORPORATION; OR CONVEY ANY RIGHTS OR PERMISSION TO MANUFACTURE, USE, OR SELL ANY PATENTED INVENTION THAT MAY RELATE TO THEM.

THIS REPORT IS RELEASABLE TO THE NATIONAL TECHNICAL INFORMATION SERVICE (NTIS). AT NTIS, IT WILL BE AVAILABLE TO THE GENERAL PUBLIC, INCLUDING FOREIGN NATIONS.

THIS TECHNICAL REPORT HAS BEEN REVIEWED AND IS APPROVED FOR PUBLICATION.

  
DARRELL BARKER  
Project Engineer  
Embedded Info Sys Engineering Branch  
Information Technology Division

  
JAMES S. WILLIAMSON, Chief  
Embedded Info Sys Engineering Branch  
Information Technology Division  
Information Directorate

Do not return copies of this report unless contractual obligations or notice on a specific document requires its return.

# REPORT DOCUMENTATION PAGE

*Form Approved  
OMB No. 0704-0188*

The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. **PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.**

|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     |                                                                                     |                                  |                                                                    |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|-------------------------------------|-------------------------------------------------------------------------------------|----------------------------------|--------------------------------------------------------------------|--|
| <b>1. REPORT DATE (DD-MM-YY)</b><br>August 2002                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>2. REPORT TYPE</b><br>Final                                                      |                                  | <b>3. DATES COVERED (From - To)</b><br>03/22/1995 – 04/30/1999     |  |
| <b>4. TITLE AND SUBTITLE</b><br><br>POWER ESTIMATION AND SYNTHESIS FOR LOW POWER                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                    |                                     | <b>5a. CONTRACT NUMBER</b><br>F33615-95-C-1625                                      |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>5b. GRANT NUMBER</b>                                                             |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>5c. PROGRAM ELEMENT NUMBER</b><br>61101E                                         |                                  |                                                                    |  |
| <b>6. AUTHOR(S)</b><br><br>Dr. Kaushik Roy                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                    |                                     | <b>5d. PROJECT NUMBER</b><br>C506                                                   |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>5e. TASK NUMBER</b><br>02                                                        |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>5f. WORK UNIT NUMBER</b><br>03                                                   |                                  |                                                                    |  |
| <b>7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)</b><br><br>Purdue University<br>Department of Electrical and Computer Engineering<br>1285 Electrical Engineering Building<br>West Lafayette, IN 47907-1285                                                                                                                                                                                                                                                                                                                    |                                    |                                     | <b>8. PERFORMING ORGANIZATION REPORT NUMBER</b>                                     |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     |                                                                                     |                                  |                                                                    |  |
| <b>9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)</b><br><br>Information Directorate<br>Air Force Research Laboratory<br>Air Force Materiel Command<br>Wright-Patterson Air Force Base, OH 45433-7334                                                                                                                                                                                                                                                                                                                      |                                    |                                     | <b>10. SPONSORING/MONITORING AGENCY ACRONYM(S)</b><br>AFRL/IFTA                     |                                  |                                                                    |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                    |                                     | <b>11. SPONSORING/MONITORING AGENCY REPORT NUMBER(S)</b><br>AFRL-IF-WP-TR-2002-1523 |                                  |                                                                    |  |
| <b>12. DISTRIBUTION/AVAILABILITY STATEMENT</b><br>Approved for public release; distribution is unlimited.                                                                                                                                                                                                                                                                                                                                                                                                                           |                                    |                                     |                                                                                     |                                  |                                                                    |  |
| <b>13. SUPPLEMENTARY NOTES</b>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                    |                                     |                                                                                     |                                  |                                                                    |  |
| <b>14. ABSTRACT</b><br><br>This document is the final report of the Power Estimation and Synthesis for Low Power. It describes the contributions and achievements of this project. The project explored a wide variety of techniques related to the design of low power CMOS electronic circuits. It explored power estimation techniques, synthesis techniques, macro level design techniques, and low power CMOS logic families. A number of computer-aided design algorithms were implemented to support the various techniques. |                                    |                                     |                                                                                     |                                  |                                                                    |  |
| <b>15. SUBJECT TERMS</b><br><br>low power electronics, circuit synthesis, power estimation, CMOS logic families, electronic computer-aided design                                                                                                                                                                                                                                                                                                                                                                                   |                                    |                                     |                                                                                     |                                  |                                                                    |  |
| <b>16. SECURITY CLASSIFICATION OF:</b>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                    |                                     | <b>17. LIMITATION OF ABSTRACT:</b><br>SAR                                           | <b>18. NUMBER OF PAGES</b><br>30 | <b>19a. NAME OF RESPONSIBLE PERSON</b> (Monitor)<br>Darrell Barker |  |
| <b>a. REPORT</b><br>Unclassified                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | <b>b. ABSTRACT</b><br>Unclassified | <b>c. THIS PAGE</b><br>Unclassified | <b>19b. TELEPHONE NUMBER</b> (Include Area Code)<br>(937) 255-6548 x3605            |                                  |                                                                    |  |

# Table of Contents

| Section                                                       |
|---------------------------------------------------------------|
| List of Figures                                               |
| 1. Introduction                                               |
| 2. Average Dynamic Power Estimation                           |
| 2.1 Techniques to Estimate Power in Combinational Circuits    |
| 2.2 Techniques to Estimate Power in Sequential Circuits       |
| 2.3 Estimation of Bounds on Average Power                     |
| 2.4 Power Macromodeling Technique                             |
| 3. Leakage Power Estimation and Control                       |
| 3.1 Leakage Current Model                                     |
| 3.2 Techniques to Estimate Leakage Power                      |
| 3.3 Leakage Control Techniques                                |
| 4. Peak Power Estimation                                      |
| 5. Synthesis of Low-Power Logic                               |
| 6. High-Performance Low-Power Complex CMOS Logic Styles       |
| 6.1 Differential Current Switch Logic (DCSL)                  |
| 6.2 Quantifying Noise Immunity of Gates                       |
| 6.3 Stacking Effects                                          |
| 6.4 Ratioed Static CMOS                                       |
| 6.5 Power Reduction in Long Lines Using Split Gate Structures |
| 6.6 Multiple Supply Design                                    |
| 7. Dual-V<sub>th</sub> Circuit Design                         |
| 7.1 Dual-V<sub>th</sub> Circuit Schemes                       |
| 7.2 Delay and Power Estimation Methods                        |
| 7.3 Algorithms for dual-V<sub>th</sub>                        |
| 8. Low Power BIST                                             |
| 8.1 POWERTEST                                                 |
| 8.2 MACROTEST                                                 |
| 9. Low-Power VLSI Signal Processing                           |
| 9.1 Low-Complexity Multiplierless Filters                     |
| Bibliography                                                  |

## List of Figures

| <u>Figure</u>                                                                                               |
|-------------------------------------------------------------------------------------------------------------|
| 1. Graph Representation of an Example Filter With $M = 4$                                                   |
| 2. Graph Representation of an Example Filter With $M = 8$                                                   |
| 3. Average Number of Adders per First-Order Differential Coefficient For Sign-Magnitude Number Representation |
| 4. Average Number of Adders per First-Order Differential Coefficient For SPT Number                         |

## **1. Introduction**

The lowest possible power dissipation is a requirement for many Department of Defense (DoD) and commercial circuits. For DoD applications, low power is demanded by man-portable equipment, missiles, munitions, and satellites. For commercial applications, low power is demanded by portable computer, entertainment, medical, and communication systems. The Power Estimation and Synthesis for Low Power project investigated a wide variety of circuit design techniques, power estimation algorithms, design algorithms, and advanced logic circuit types that may be used to implement low power electronic systems.

## 2. Average Dynamic Power Estimation

In this research we developed techniques, and corresponding software tools, to estimate power dissipation in digital CMOS circuits at different levels of design abstraction. The main contributions and achievements are listed below.

- Efficient and accurate techniques to estimate power in combinational circuits
- Efficient and accurate techniques to estimate power in sequential circuits
- Estimation of bounds on power based on power sensitivity
- Development and implementation of a power macromodeling technique

### *2.1 Techniques to Estimate Power in Combinational Circuits*

Symbolic and statistical techniques have been developed to accurately estimate power dissipation, taking into account simultaneous switching as well as temporal and spatial signal correlations.

The basic idea of the symbolic method is to express the signal probability (the probability of a signal being logic ONE) and the signal activity (the probability of a signal switching) of each internal node in terms of the probabilities and activities of the primary inputs, so that spatial correlation between internal nodes can be handled. Results show that the power dissipation determined by our technique is on average within 2% of logic simulation results. Ignoring simultaneous switching can introduce an error of more than 21%.
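The effect of spatial correlation can be illustrated with a minimal sketch. It is not the symbolic tool developed in this project: brute-force enumeration over primary-input combinations stands in for the symbolic computation, and the three-input circuit, the function names, and the input probabilities are all assumptions chosen for illustration. The primary input `a` fans out to two AND gates that reconverge at an OR gate, so the two internal nodes are correlated.

```python
from itertools import product


def exact_signal_probability(func, input_probs):
    """Probability that func evaluates to 1, expressed purely in terms of
    the primary-input probabilities (inputs assumed mutually independent)."""
    names = sorted(input_probs)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        weight = 1.0
        for name, value in zip(names, values):
            p = input_probs[name]
            weight *= p if value else 1.0 - p
        if func(dict(zip(names, values))):
            total += weight
    return total


def output_y(x):
    """Hypothetical circuit with reconvergent fanout on input a."""
    n1 = x["a"] & x["b"]
    n2 = x["a"] & x["c"]
    return n1 | n2


probs = {"a": 0.5, "b": 0.5, "c": 0.5}

# Exact value: the output is a function of the primary inputs only.
exact = exact_signal_probability(output_y, probs)  # 0.375

# Naive propagation: treats n1 and n2 as independent, which they are not.
p_n1 = probs["a"] * probs["b"]
p_n2 = probs["a"] * probs["c"]
naive = 1.0 - (1.0 - p_n1) * (1.0 - p_n2)  # 0.4375
```

For this small example the naive gate-by-gate propagation overestimates the output probability (0.4375 against 0.375), which is the kind of error that expressing internal nodes in terms of primary inputs avoids.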

The basic idea of the Monte Carlo based statistical method is to simulate a circuit with random patterns applied to the primary inputs. These random patterns conform to the given probabilities and activities of the primary inputs. The number of simulations is determined by user-specified parameters, such as the confidence level and the error that can be tolerated. The statistical technique can handle different delay models for logic gates, so that spurious transitions are included in the analysis. Spurious transitions can occur because paths with different delays converge at a logic gate, and they increase power dissipation. Results indicate that spurious transitions can account for more than 50% of the power dissipation for some benchmark circuits.
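The stopping rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's tool: the two-gate circuit is hypothetical, a zero-delay model is used (so spurious transitions are not captured), consecutive input vectors are drawn independently (no temporal correlation), and the function and parameter names are invented for this example. Simulation stops once the confidence-interval half-width falls below the tolerated relative error.

```python
import math
import random


def count_toggles(prev, cur):
    """Number of internal nodes that change value between two input vectors
    in a tiny hypothetical circuit: n1 = a AND b, n2 = n1 OR c."""
    def nodes(vector):
        a, b, c = vector
        n1 = a & b
        n2 = n1 | c
        return (n1, n2)
    return sum(p != q for p, q in zip(nodes(prev), nodes(cur)))


def monte_carlo_toggles(input_probs, max_rel_error=0.05, z=1.96,
                        min_samples=100, max_samples=200_000, seed=0):
    """Estimate the mean toggle count per vector pair.

    z = 1.96 corresponds to a 95% confidence level under a normal
    approximation; max_rel_error is the tolerated relative error.
    """
    rng = random.Random(seed)

    def draw():
        return tuple(int(rng.random() < p) for p in input_probs)

    n = 0
    total = 0.0
    total_sq = 0.0
    while n < max_samples:
        t = count_toggles(draw(), draw())
        n += 1
        total += t
        total_sq += t * t
        if n >= min_samples:
            mean = total / n
            # Unbiased sample variance from running sums.
            variance = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
            half_width = z * math.sqrt(variance / n)
            if mean > 0 and half_width <= max_rel_error * mean:
                break
    return total / n, n


estimate, samples = monte_carlo_toggles((0.5, 0.5, 0.5))
```

With all input probabilities at 0.5, the expected toggle count for this circuit is 0.84375, so `estimate` should land close to that value. Tightening `max_rel_error` or raising the confidence level increases `samples`.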

### *2.2 Techniques to Estimate Power in Sequential Circuits*

Probabilistic and statistical techniques have been implemented to estimate power dissipation in sequential circuits. Because the next-state outputs are fed back as inputs to the circuit, the estimation techniques for combinational and sequential circuits are quite different.

The technique to estimate signal probability and activity works as follows. Given the STG (state transition graph) of a sequential circuit or an FSM (finite state machine), we build an Extended State Transition Graph (ESTG) and calculate the probability of each state of the ESTG. The signal activities are then estimated from the ESTG.

The exact method may require solving a linear system of equations of size $2^{N+M}$, where $N$ is the number of primary inputs and $M$ is the number of flip-flops. For large circuits with a large number of primary inputs and flip-flops, the exact method is not computationally feasible. Therefore, we propose an approximate method which takes temporal correlations of primary inputs into account. We unroll the circuit $k$ times to calculate the probabilities and activities of internal nodes. Results indicate that this technique can have an accuracy of 90% while being several orders of magnitude faster than logic simulation.
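The state-probability step can be illustrated on a plain STG. The sketch below is an assumption-laden simplification: it works on an ordinary three-state STG rather than the ESTG, it uses power iteration instead of a direct linear solve, and the example machine and all function names are hypothetical.

```python
def transition_matrix(next_state, input_probs, num_states):
    """Build T[s][d] = probability of moving from state s to state d,
    given the probability of each primary-input value."""
    t = [[0.0] * num_states for _ in range(num_states)]
    for s in range(num_states):
        for x, px in input_probs.items():
            t[s][next_state(s, x)] += px
    return t


def steady_state(t, tol=1e-12, max_iter=10_000):
    """Stationary state probabilities by power iteration."""
    n = len(t)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[s] * t[s][d] for s in range(n)) for d in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def next_state(state, x):
    """Hypothetical FSM: x = 1 advances a modulo-3 counter, x = 0 resets it."""
    return (state + 1) % 3 if x == 1 else 0


t = transition_matrix(next_state, {0: 0.5, 1: 0.5}, num_states=3)
state_probs = steady_state(t)  # approximately [4/7, 2/7, 1/7]
```

For this machine the balance equations can be solved by hand, giving probabilities 4/7, 2/7, and 1/7, which the iteration reproduces. The number of states grows exponentially with the number of flip-flops, which is why the exact approach does not scale.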

Statistical techniques for sequential circuits consider Near-Closed (NC) sets of states. A set of states is called "near-closed" if, when the starting state is in the set, the probability of remaining in that set is high. Techniques to determine the warm-up period and stopping criteria for Monte Carlo based statistical simulations have been developed for circuits in which NC sets are present. The computation time of state probability can be reduced by 50% (compared to the standard Monte Carlo technique) by the proposed method. The relative error of the individual node activity estimated by the Monte Carlo based technique with a warm-up period is within 3% of the result obtained by long-run logic simulation.

### *2.3 Estimation of Bounds on Average Power*

Power dissipation in CMOS circuits is heavily dependent on the input signal distribution. However, because of uncertainties in the specification of the input signal distribution, the average power dissipation should be specified as lying between a minimum and a maximum possible value. Due to the complex nature of the problem, it is practically impossible to use traditional power estimation techniques to determine the bounds. Power sensitivity, defined as the change in average power due to changes in the specification of the primary inputs, can be used to accurately estimate the maximum and minimum bounds for average power.

Both symbolic and statistical techniques have been developed to estimate power sensitivity as a by-product of average power estimation, thereby leading to efficient implementation. Our results on ISCAS and MCNC benchmark circuits indicate that for some circuits power dissipation can be very sensitive to some primary inputs. A small variation in signal distribution can cause power dissipation to change drastically. Results on minimum and maximum average power show that such bounds can vary widely if the primary input probabilities and activities are not specified accurately.
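One way to read the role of power sensitivity is through a first-order expansion. The expression below is an illustrative assumption rather than the formula used in this project: $x_i$ denotes a primary-input probability or activity, $x^0$ the nominal specification, $\Delta x_i$ the uncertainty in $x_i$, and $S_i = \partial P_{avg} / \partial x_i$ the power sensitivity evaluated at $x^0$.

$$
P_{\max} \approx P_{avg}(x^0) + \sum_i |S_i| \, \Delta x_i,
\qquad
P_{\min} \approx P_{avg}(x^0) - \sum_i |S_i| \, \Delta x_i
$$

Under this reading, an input with a large $|S_i|$ widens the interval between the bounds even for a small $\Delta x_i$, which is consistent with the observation that power can be very sensitive to some primary inputs. The approximation holds only while the power surface is close to linear over the uncertainty range.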

### *2.4 Power Macromodeling Technique*

In order to shorten design time and to reduce design iterations, we have to estimate the power dissipation at a high level of abstraction to ensure that the strict power requirement of a future design is satisfied. One of the main objectives of this research is to develop a power macromodel for a module so that power dissipation can be obtained under any distribution of primary inputs. When the same module is reused, we can obtain its power simply by using a look-up table. Since the power dissipation of a circuit is strongly dependent on the statistics of primary inputs, the relationship of power versus primary input probabilities and activities is a complicated surface. Once such a surface is constructed, power dissipation under any distribution of primary inputs can be easily obtained.

A straightforward way is to approximate such a power surface using a large number of discrete points. The more points one chooses, the more accurate the result one can obtain. However, more points directly translate to longer CPU time.

Power sensitivity can be used to efficiently develop a power macromodel. The power surface can be approximated by planes, each constructed from a representative point and the power sensitivities at that point. Results for power dissipation under any distribution of primary inputs demonstrated the accuracy and efficiency of this technique.
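A single plane of such a macromodel can be sketched as follows. Everything here is assumed for illustration: `reference_power` is a stand-in power surface for a hypothetical two-gate module (unit node capacitances, temporally independent inputs), and the sensitivities are obtained by finite differences rather than as a by-product of power estimation as in this research.

```python
def reference_power(p):
    """Stand-in power surface: summed switching activity of n1 = a AND b and
    n2 = n1 OR c, as a function of the input signal probabilities."""
    pa, pb, pc = p
    q1 = pa * pb
    q2 = 1.0 - (1.0 - q1) * (1.0 - pc)
    return 2.0 * q1 * (1.0 - q1) + 2.0 * q2 * (1.0 - q2)


def sensitivities(power, point, h=1e-4):
    """Central-difference estimate of the power sensitivity to each input."""
    grads = []
    for i in range(len(point)):
        hi = list(point)
        lo = list(point)
        hi[i] += h
        lo[i] -= h
        grads.append((power(hi) - power(lo)) / (2.0 * h))
    return grads


def build_plane(power, point):
    """Return a plane through a representative point, oriented by the
    power sensitivities at that point."""
    p0 = power(point)
    s = sensitivities(power, point)

    def plane(query):
        return p0 + sum(si * (qi - xi) for si, qi, xi in zip(s, query, point))

    return plane


representative = (0.5, 0.5, 0.5)
plane = build_plane(reference_power, representative)

query = (0.55, 0.45, 0.5)
approx = plane(query)            # macromodel lookup
exact = reference_power(query)   # direct evaluation for comparison
```

Near the representative point the plane tracks the surface closely. A full macromodel would store several such planes and pick the one whose representative point is closest to the query.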

## **3. Leakage Power Estimation and Control**

With the scaling down of supply and transistor threshold voltage, the power dissipation due to sub-threshold leakage can be high. We developed and implemented a software tool to estimate the leakage power in both stand-by and active mode of operation. We developed:

- Leakage current model
- Techniques to estimate leakage power
- Techniques to control leakage current in future generation ICs

Low supply voltage requires the device threshold to be reduced in order to maintain performance. Due to the exponential relationship between leakage current and threshold voltage in the weak inversion region, leakage power can no longer be ignored. We present a technique to accurately estimate leakage power by modeling the leakage current in transistor stacks. The standby leakage current model has been verified by HSPICE. We demonstrate the dependence of leakage power on primary inputs. Based on our analysis we can determine good bounds for leakage power in the standby mode. As a by-product of this analysis, we can also determine the set of input vectors which put the circuit in a low-power standby mode.

### *3.1 Leakage Current Model*

The accuracy of leakage power estimation is critically dependent on the standby leakage current model. We have developed a general model of leakage current for transistors connected in a stack. This model considers the general case of transistor stacks of arbitrary height. It takes into account both body effect and DIBL (Drain Induced Barrier Lowering). DIBL (reduction of threshold voltage as $V_{DS}$ increases) is especially significant for sub-micron devices. The leakage of a transistor stack is shown to directly depend on the magnitude of the DIBL effect. The standby leakage current model has been verified by HSPICE.
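The stacking behaviour can be illustrated with a small numerical sketch. It does not reproduce the model developed in this project: it uses a commonly quoted simplified subthreshold expression with linear body-effect and DIBL terms, and every device parameter below is a made-up illustrative value. The intermediate node voltage of a two-transistor NMOS stack with both gates at 0 V is found by bisection, by equating the currents of the two transistors.

```python
import math

# All values are hypothetical and chosen only for illustration.
VT = 0.0259    # thermal voltage kT/q near room temperature, in volts
I0 = 1e-7      # current prefactor, in amperes
N = 1.5        # subthreshold swing coefficient
ETA = 0.1      # DIBL coefficient
GAMMA = 0.2    # linearized body-effect coefficient
VTH0 = 0.3     # zero-bias threshold voltage, in volts
VDD = 1.0      # supply voltage, in volts


def subthreshold_current(vgs, vds, vsb):
    """Simplified subthreshold current with body effect and DIBL."""
    exponent = (vgs - VTH0 - GAMMA * vsb + ETA * vds) / (N * VT)
    return I0 * math.exp(exponent) * (1.0 - math.exp(-vds / VT))


def two_stack_leakage(vdd=VDD, iterations=60):
    """Leakage of two series NMOS devices, both with gate at 0 V.

    The internal node voltage vx is the point where the top and bottom
    transistor currents are equal.
    """
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vx = 0.5 * (lo + hi)
        i_top = subthreshold_current(-vx, vdd - vx, vx)   # source sits at vx
        i_bottom = subthreshold_current(0.0, vx, 0.0)     # source at ground
        if i_top > i_bottom:
            lo = vx   # node must rise to balance the currents
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return subthreshold_current(0.0, vx, 0.0), vx


single = subthreshold_current(0.0, VDD, 0.0)
stacked, vx = two_stack_leakage()
```

The internal node settles at a small positive voltage, which gives the top device a negative gate-to-source voltage, raises its threshold through the body effect, and lowers the drain-to-source voltage of both devices. As a result `stacked` is smaller than `single`, and the size of the reduction depends on `ETA`.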

### *3.2 Techniques to Estimate Leakage Power*

Because of reverse biasing between gate and source in transistor stacks, DIBL, and the body effect, the leakage power of a circuit depends on the primary input combination. Hence, the leakage power of a circuit should be specified as lying between a minimum and a maximum possible value.

A genetic algorithm and a deterministic approach have been developed to effectively search for bounds on leakage power. Compared with random search techniques, these approaches produce considerably tighter bounds for leakage power dissipation in the stand-by mode of operation.

### *3.3 Leakage Control Techniques*

Since the leakage power of a circuit depends on the primary input combination, the primary inputs corresponding to the minimum leakage power can be applied to the circuit during standby mode to minimize the leakage power, thereby reducing total power consumption. Results for minimum and maximum leakage power indicate that for some circuits leakage power can vary widely with different primary input combinations. Applying the best primary input combination to a circuit during standby mode will significantly reduce the leakage power.
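The idea of selecting a standby input vector can be sketched on a toy circuit. This is an illustration only: exhaustive enumeration stands in for the genetic and deterministic searches mentioned above and is feasible only for a handful of inputs, the three-NAND circuit is hypothetical, and the per-gate leakage table contains made-up numbers in arbitrary units.

```python
from itertools import product

# Hypothetical leakage of a 2-input NAND gate for each input pattern
# (arbitrary units). Pattern (0, 0) turns off both stacked NMOS devices.
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 8.0}


def nand(a, b):
    return 1 - (a & b)


def circuit_leakage(vector):
    """Total leakage of y = NAND(NAND(a, b), NAND(c, d)) for one input vector."""
    a, b, c, d = vector
    n1 = nand(a, b)
    n2 = nand(c, d)
    return (NAND2_LEAKAGE[(a, b)]
            + NAND2_LEAKAGE[(c, d)]
            + NAND2_LEAKAGE[(n1, n2)])


vectors = list(product((0, 1), repeat=4))
best_vector = min(vectors, key=circuit_leakage)
worst_vector = max(vectors, key=circuit_leakage)
min_leakage = circuit_leakage(best_vector)    # 10.0 at (0, 0, 0, 0)
max_leakage = circuit_leakage(worst_vector)   # 17.0 at (1, 1, 1, 1)
```

Even in this toy case the best vector is not obvious from a single gate, because the pattern that minimizes leakage in the first-level gates drives the output gate into its highest-leakage state.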

## **4. Peak Power Estimation**

With the high demand for reliability and performance, accurate estimation of the maximum instantaneous power dissipation in CMOS circuits is essential for determining the IR drop on supply lines and for optimizing the power and ground routing. Unfortunately, the problem of determining the input patterns that induce the maximum current, and hence the maximum power, is NP-complete. Even for circuits with a small number of primary inputs (PIs), an exhaustive search of the input vector space is CPU time intensive. In this research, we developed an Automatic Test Generation (ATG) based technique to efficiently generate tight lower bounds of the maximum instantaneous power for CMOS circuits with non-zero gate delays. Power dissipation due to spurious transitions has been considered by incorporating static timing analysis into the estimation process. Experiments were performed on ISCAS and MCNC benchmarks. Results show that the ATG-based technique is superior to the traditional simulation-based technique in both speed and performance. For example, for ISCAS89 sequential benchmark circuits having over 10,000 gates, the ATG-based approach is on average 80% better and 26192% faster.
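To show why exhaustive search does not scale, the sketch below enumerates every pair of consecutive input vectors for a tiny circuit and reports the largest switched capacitance. It is a baseline illustration, not the ATG-based technique: the circuit and node capacitances are hypothetical, and a zero-delay model is assumed, so spurious transitions are ignored.

```python
from itertools import product

# Hypothetical node capacitances for n1, n2, and y (arbitrary units).
NODE_CAPS = (1.0, 1.0, 2.0)


def node_values(vector):
    """Zero-delay evaluation of n1 = NAND(a, b), n2 = NOR(b, c),
    y = NAND(n1, n2)."""
    a, b, c = vector
    n1 = 1 - (a & b)
    n2 = 1 - (b | c)
    y = 1 - (n1 & n2)
    return (n1, n2, y)


def switched_capacitance(v1, v2):
    """Capacitance of all nodes that change value when v1 is followed by v2."""
    before = node_values(v1)
    after = node_values(v2)
    return sum(c for c, p, q in zip(NODE_CAPS, before, after) if p != q)


def exhaustive_peak(num_inputs):
    """Search all 4**num_inputs vector pairs for the maximum."""
    vectors = list(product((0, 1), repeat=num_inputs))
    pairs = [(v1, v2) for v1 in vectors for v2 in vectors]
    best = max(pairs, key=lambda pair: switched_capacitance(*pair))
    return switched_capacitance(*best), best, len(pairs)


peak, peak_pair, pairs_searched = exhaustive_peak(3)
```

Three inputs already require 64 vector pairs, and each additional input multiplies that count by four.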

## **5. Synthesis of Low-Power Logic**

In this research we introduce algebraic procedures for node extraction and factorization that target low power consumption in combinational logic circuits. A new cost function is also proposed for the sum-of-products representation of the expressions. This cost function is used to guide the power optimization procedures. The spatial and temporal correlations of signals were taken into account to obtain accurate power estimates. The results show an average power saving of 15% for logic synthesis guided by the proposed accurate power estimation technique, compared to area-optimized designs.
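A rough sketch of how an activity-based cost can distinguish two algebraically equivalent forms is given below. It is not the report's cost function: it assumes unit capacitance at every node, independent inputs, and temporally independent samples (so the activity of a node with signal probability $p$ is $2p(1-p)$), whereas the report accounts for spatial and temporal correlations.

```python
def activity(p):
    """Switching activity of a node with signal probability p, assuming
    temporally independent samples (a simplification of the report's
    correlated model)."""
    return 2.0 * p * (1.0 - p)


def cost_sum_of_products(pa, pb, pc):
    """f = a*b + a*c: two AND nodes plus the OR output."""
    p_ab, p_ac = pa * pb, pa * pc
    p_f = pa * (pb + pc - pb * pc)
    return activity(p_ab) + activity(p_ac) + activity(p_f)


def cost_factored(pa, pb, pc):
    """f = a*(b + c): one OR node plus the AND output."""
    p_or = pb + pc - pb * pc
    p_f = pa * p_or
    return activity(p_or) + activity(p_f)
```

With all input probabilities at 0.5, the factored form has the lower total activity in this simplified model, which is the kind of comparison a power-driven factorization procedure relies on.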

We have also developed a transistor reordering technique to achieve low power under performance constraints. The technique is based on signal activities at the internal nodes of logic gates. Results show that on average a 7% improvement in power can be achieved with no or minimal area increase. Hence, this technique comes virtually for free.

## 6. High-Performance Low-Power Complex CMOS Logic Styles

In this research we consider:

- Logic styles which concentrate only on the extreme performance end and handle issues imposed by scaled  $V_{TH}$  and  $V_{DD}$ .
- High performance coupled with low idle power.
- Dynamic complex gate logic family (DCSL).
- Quantifying dynamic noise immunity.
- Monotonic static CMOS structures: performance coupled with low leakage power by exploiting stacked transistors.
- Power reduction on long lines using split gate structures to reduce voltage swings.

Over the years, integrated circuits and systems have gained increasing degrees of complexity, performance and functionality. The core driver has been the improvement of processing technology, which has scaled device features into the deep sub-micron range. While device transconductance has improved, increased leakage levels due to falling threshold voltages  $V_{TH}$  and lower  $V_{DD}$  to  $V_{TH}$  ratios have made operation with traditional high-performance dynamic logic styles difficult. The high active power of circuits operating at the highest speeds has led to an increased amount of power management being applied. This in turn has led to these circuits spending an increased amount of time in an idle state, where leakage power is important. Logic styles which simultaneously maintain or improve performance, tolerate leakage and feature low standby power are therefore important. The logic styles chosen allow large fan-in. This is in common with most high-performance dynamic logic styles, e.g. Domino, where a considerable fraction of the performance improvement is due to high fan-in gates allowing shorter logic depths.

### 6.1 Differential Current Switch Logic (DCSL)

DCSL is a dynamic logic style which features improved performance with lowered active power. Topologically it is a differential cascode voltage switch circuit, with additional transistors to automatically lock out inputs once evaluation is completed. Improved performance is achieved by

- Allowing high fan-in gates with little impact on speed.
- Exploiting the large fan-in to reduce logic depth.

Simultaneously lower power is realized by restricting the voltage swing at internal nodes. Advantages of DCSL were quantified on the critical path of a 64 bit adder. Logic depth fell from 6 to 4, with performance improvement of 26% and power improvement of 22%.

### 6.2 Quantifying Noise Immunity of Gates

A desirable requirement of high speed logic gates is their ability to implement simple sequential clocked pipelines - the clocked sequential design style being the dominant design method. Increasing difficulty in generating clocks with low skews has led to locally self clocked circuit styles being employed. Such pulsed logic circuits spend a very short period of time in their evaluate stage. The evaluate time is an integer number of gate delays (as governed by the delay for forcing the gate to precharge). Traditional static noise margin analysis yields a very distorted picture of the noise immunity of such gates. By considering all capacitively coupled noise sources, we show that a trapezoidal model for injected noise pulses is an adequate method for estimating the immunity of such gates to coupled noise sources. The analysis is particularly relevant in highlighting the importance of monotonic CMOS structures for high speed circuits.

### 6.3 Stacking Effects

As mentioned earlier, we consider the reduction of leakage power to be important for high speed circuits, since they are aggressively power managed. The stacking effect - lower leakage currents in the presence of a stack of MOS transistors - is an effective circuit technique to reduce leakage. Results measured on an 8X8 bit carry save multiplier show that such techniques can give 7x reductions in leakage current over a wide range of temperature.

### 6.4 Ratioed Static CMOS

Ratioed static CMOS operates conventional static CMOS in a precharge-evaluate pulsed mode. A single transition sense is present at the gate outputs, which allows the gates to be preferentially skewed to speed up the evaluate transition. By placing series-connected MOS devices in the precharge paths and mostly parallel devices in the evaluate paths, we achieve the following:

- Higher speed due to reduced capacitive loads, preferentially skewed switch thresholds, and lower crow-bar currents.
- Lower leakage power, since the circuit in its evaluate state switches off the series devices in the precharge paths. Additionally, the noncritical nature of this path allows the use of lower leakage devices, through longer channel lengths, back-biased wells, or elevated  $V_{TH}$ .

### 6.5 Power Reduction in Long Lines Using Split Gate Structures

While ratioed static CMOS allows high speed for cases where the gate capacitance dominates, it does not provide substantial benefits if the interconnect is the dominant load. By reducing the voltage swing on long lines we achieve simultaneous advantages of speed and power improvement. We note that a  $V_{TH}$  drop exists from gate output to internal nodes. Splitting gates to make the internal nodes drive the actual line restricts the voltage swing on the lines to  $V_{DD} - V_{TH}$ . The method is especially useful when the  $V_{DD}$  to  $V_{TH}$  ratio is low.
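As a rough worked estimate of the power benefit, assume the line capacitance is charged from $V_{DD}$ once per cycle and that short-circuit and leakage currents are neglected. The symbols $C_L$, $E_{full}$ and $E_{split}$ are introduced here for illustration only and do not appear in the report.

$$E_{full} = C_L V_{DD}^2, \qquad E_{split} = C_L V_{DD} (V_{DD} - V_{TH})$$

$$\frac{E_{split}}{E_{full}} = 1 - \frac{V_{TH}}{V_{DD}}$$

For example, with the illustrative values $V_{DD} = 1.0$ V and $V_{TH} = 0.3$ V the ratio is 0.7, a 30% reduction in energy drawn from the supply. The saving grows as the $V_{DD}$ to $V_{TH}$ ratio falls, which is consistent with the method being most useful at low ratios.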

### 6.6 Multiple Supply Design

Scheduling and allocation techniques that use multiple supply voltages during scheduling have been developed for DSP datapaths. Under such a scheduling strategy, different functional blocks (such as multipliers, adders, etc.) are allowed to run at their optimum supply voltages. Voltage converters are required between functional blocks running at different supply voltages. Results show more than 50% improvement in power dissipation with no degradation in performance.
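The trade-off can be sketched for a simple chain of operations. The sketch below is an exhaustive search, not the scheduling algorithm developed in this work, and it rests on assumptions: the `DELAY` table, the unit-capacitance energy model ($V^2$ per operation) and the `CONVERTER_ENERGY` penalty are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical (voltage -> delay in control steps) table for one operation
# type; lower supply voltage means a slower unit. Values are illustrative.
DELAY = {5.0: 1, 3.3: 2, 2.4: 3}
CONVERTER_ENERGY = 1.0  # assumed cost of each level conversion


def energy(assignment):
    """Energy of a chain of operations: V^2 per operation (unit capacitance)
    plus a penalty whenever consecutive operations use different supplies."""
    ops = sum(v * v for v in assignment)
    conversions = sum(a != b for a, b in zip(assignment, assignment[1:]))
    return ops + CONVERTER_ENERGY * conversions


def best_assignment(num_ops, deadline):
    """Exhaustively pick per-operation voltages meeting the deadline.

    Raises ValueError if no assignment can meet the deadline.
    """
    feasible = [
        a for a in product(DELAY, repeat=num_ops)
        if sum(DELAY[v] for v in a) <= deadline
    ]
    return min(feasible, key=energy)
```

With a tight deadline every operation must run at the highest voltage; as slack becomes available, operations move to lower supplies and the energy drops, at the cost of level conversions between blocks.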

## 7. Dual- $V_{th}$ Circuit Design

In this research we achieved the following:

- Developed dual threshold circuit design methodology.
- Developed different dual threshold circuit schemes.
- Developed several dual- $V_{th}$  assignment algorithms to achieve the best leakage savings under performance constraints.

In CMOS digital circuits, power dissipation consists of dynamic and static components. Since dynamic power is proportional to the square of supply voltage  $V_{dd}$  and static power is proportional to  $V_{dd}$ , lowering supply voltage is the most effective way to reduce power consumption as long as dynamic power is dominant. With the lowering of supply voltage, transistor threshold voltage should also be scaled in order to satisfy the performance requirements. Unfortunately, such scaling can lead to a dramatic increase in leakage power due to the sub-threshold leakage current.

The dual threshold technique reduces leakage power by assigning a high threshold voltage to some transistors on non-critical paths, while transistors on critical paths keep a low threshold voltage. Both high performance and low power can therefore be achieved simultaneously. However, not all the transistors on non-critical paths can be assigned a high threshold voltage; otherwise the critical path may change, thereby increasing the critical path delay.

In order to achieve the best leakage power saving under performance constraints, we present a dual- $V_{th}$  design methodology. Different dual threshold circuit schemes are considered and several dual- $V_{th}$  assignment algorithms are provided. A standby leakage model which has been verified by HSPICE simulations is used to estimate the standby leakage power of a circuit.

### 7.1 Dual- $V_{th}$ Circuit Schemes

Different dual- $V_{th}$  schemes are considered in our analysis:

- Gate level dual- $V_{th}$  circuit (DVT)  
All the transistors within one gate have the same threshold voltage.
- Mixed- $V_{th}$  type 1 (MVT1)  
There is no mixed  $V_{th}$  within a p pull-up tree or within an n pull-down tree.
- Mixed- $V_{th}$  type 2 (MVT2)  
Mixed  $V_{th}$  is allowed anywhere except for the series connected transistors (transistors in a stack).

### 7.2 Delay and Power Estimation Methods

Delay information can be obtained by the following methods:

- (a) Elmore delay model
- (b) Delay look-up table based on HSPICE simulations
- (c) Pathmill simulations

Dynamic power dissipation can be estimated by a Monte Carlo based statistical method in which the switching at internal nodes is taken into account. An accurate leakage model, verified by HSPICE simulations, is used for leakage power estimation. The stack effect, short channel effect and body effect are considered. Because leakage current depends on the input signals, the average leakage power is evaluated with random patterns applied to the primary inputs.

### 7.3 Algorithms for dual- $V_{th}$ assignment

Three algorithms for dual- $V_{th}$  assignment have been developed and implemented:

- Back tracing algorithm ( $O(n)$ )
- Priority selection algorithm ( $O(n^2)$ )
- Priority-based back tracing algorithm ( $O(n)$ )

The priority selection algorithm gives the largest leakage savings, but also takes the most CPU time. The back tracing algorithm is the fastest, but its leakage savings are smaller than those of the other two algorithms. For the priority-based back tracing algorithm, the leakage savings are close to those of the priority selection algorithm and the run time is similar to that of the back tracing algorithm. The method for reducing leakage power using dual-threshold-voltage transistors has been implemented in C under the Berkeley SIS environment. Results show that there is an optimal high threshold voltage for the best leakage savings and that the dual threshold technique is effective for leakage power reduction during both standby and active modes. In addition to the leakage power saving, the dynamic power is also reduced due to the reduction of internal node voltage swing for high threshold gates. The effectiveness of the dual- $V_{th}$  design technique depends on the circuit structure. For some ISCAS benchmark circuits, the leakage power can be reduced by more than 80%.
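The constraint that drives all three algorithms, namely that a gate may be moved to high  $V_{th}$  only if the critical path delay stays within bounds, can be illustrated with a simple greedy pass. This is a minimal sketch and not one of the three algorithms listed above; the gate-level netlist format and the delay values `d_low` and `d_high` are assumptions made for illustration.

```python
def critical_delay(gates, fanin, high, d_low=1.0, d_high=1.5):
    """Longest path delay; `gates` must be in topological order."""
    arrival = {}
    for g in gates:
        d = d_high if g in high else d_low
        arrival[g] = d + max((arrival[f] for f in fanin[g]), default=0.0)
    return max(arrival.values())


def assign_high_vth(gates, fanin, max_delay):
    """Greedily move gates to high Vth while the delay constraint holds."""
    high = set()
    for g in gates:
        trial = high | {g}
        if critical_delay(gates, fanin, trial) <= max_delay:
            high = trial
    return high
```

For a netlist in which `g1 -> g2 -> g3` is the critical path and `g4` is a side input of `g3`, only `g4` can be given a high threshold when the delay budget equals the original critical delay. Each trial recomputes the full timing, so this naive pass is quadratic in the number of gates.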

## 8. Low Power BIST

The salient features of this research are given below:

### 8.1 *POWERTEST*

Due to the increasing use of portable computing and wireless communications systems, power consumption is of major concern in today's VLSI circuits. With that in mind we present a low power weighted random pattern testing technique for Built-In-Self-Test (BIST) applications. Power consumption during BIST operation can be minimized while achieving high fault coverage. Simple measures of observability and controllability of circuit nodes are proposed based on primary input signal probability (probability that a signal is logic ONE). Such measures help determine the testability of a circuit. We developed a tool, **POWERTEST**, which uses a genetic algorithm based search to determine optimal probability sets (signal probabilities or input signal distribution) at primary inputs to trade-off test time versus power dissipation and fault coverage. The inputs conforming to the primary input probability-activity sets can be generated using cellular automata or LFSR (Linear Feedback Shift Register). We observed that a single input distribution (or weights) may not be sufficient for some random-pattern resistant circuits, while multiple distributions consume larger area. As a trade-off, two distributions have been used in our analysis. Results on ISCAS benchmark circuits show that power reduction of up to **94.86%** and energy reduction of up to **99.93%** can be achieved (compared to equi-probable random-pattern testing) while achieving high fault coverage.
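Weighted random patterns of the kind described above can be produced from an LFSR by combining its bits. The sketch below is illustrative only: it uses a 4-bit LFSR and a fixed weight of about 1/4 obtained by ANDing two bits, whereas POWERTEST searches for the weights with a genetic algorithm. The function names are hypothetical.

```python
def lfsr4(seed=1):
    """4-bit maximal-length LFSR (feedback polynomial x^4 + x^3 + 1)."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF


def weighted_patterns(count, seed=1):
    """Yield (equiprobable_bit, low_weight_bit) pairs.

    The first bit is a raw LFSR bit (probability of ONE close to 1/2); the
    second ANDs two LFSR bits, lowering the probability of ONE to about 1/4.
    """
    gen = lfsr4(seed)
    for _ in range(count):
        s = next(gen)
        yield (s & 1, (s & 1) & ((s >> 1) & 1))
```

Over one full period of 15 states the raw bit is ONE 8 times and the ANDed bit 4 times. A lower signal probability also lowers the switching activity at that input, which is the mechanism by which the input distribution trades power against test length.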

### 8.2 *MACROTEST*

For large circuits, the GA-based algorithm is computationally expensive. We therefore developed an alternative tool, **MACROTEST**, which uses a macromodel-based search to determine the optimal primary input probability and activity (probability of switching) set (signal probabilities or input signal distribution). The search maximizes fault coverage with low energy consumption and can trade off test time against energy dissipation and fault coverage. The inputs conforming to the primary input probability and activity set can be generated using cellular automata or an LFSR (Linear Feedback Shift Register). Results on ISCAS benchmark circuits show that energy reduction of up to **98.25%** can be achieved (compared to equi-probable random-pattern testing) while achieving high fault coverage. We also developed a cellular automata based test generator to achieve low-power BIST.

## 9. Low-Power VLSI Signal Processing

We realized that large improvements in power dissipation are possible at high levels of design abstraction. This research focuses on design techniques that reduce power dissipation through reduced computational complexity and high-level synthesis.

### 9.1 Low-Complexity Multiplierless Filters

We present a computation reduction technique which can be used to obtain multiplierless implementations of both finite impulse response (FIR) and infinite impulse response (IIR) digital filters. The main idea is to remove computational redundancy by reordering computation. Various approaches are investigated which consider normal, differential and hybrid arrangements for storing coefficients. The frequency response of the filter is unaltered. It is shown that the reordering problem can be formulated using a graph in which vertices represent the coefficients and edges represent the resources required in a computation involving the coefficient order specified by the vertices. We present various approaches for exploiting computational redundancy reduction and the overheads involved. A major advantage of this methodology is that it is independent of the number representation scheme and the word-length of the coefficients. Simple polynomial run time algorithms are presented, and their power and potential are demonstrated by results for large filters (*lengths*  $> 300$ ) which show that less than 2 add operations per coefficient are required. Hence, these filters can be used in low-power and/or high-speed applications where data can be processed in blocks.

The main idea of our work is to find an ordering of coefficients which minimizes the number of adders required in the filter implementation, using a graph theoretic approach. We employ a differential coefficient scheme which can be implemented for any coefficient ordering in either FIR or IIR filters. Hence, the work proposed can be viewed as a special case of the more general frame-work presented in this work. Using the proposed differential *coefficients multiplierless* implementation (DCMI) scheme, one can obtain multiplierless implementations which require less than 2 adders per coefficient, as demonstrated later. The main contributions of this work are summarized below:

- The frequency response of the given filter is not altered.
- DCMI approach is independent of the number representation scheme used and our choice of the number of bits to represent the coefficients.
- Solutions of the DCMI problem are solutions to well-known graph theoretic problems. Efficient polynomial time algorithms can be employed to obtain "good" solutions.
- The frame-work presented in this work can account for more general problems which consider memory overheads (by modifying edge costs), or, when given fixed resources (by solving a graph partitioning problem).

In summary, our approach can be used to obtain a unified frame-work in which low complexity and low power block FIR filters can be obtained without compromising the frequency response characteristics of a given optimal filter. Hence, it offers a very powerful complement to the existing methodologies for reducing filter complexity in the domain of high-performance block filtering.

One may note that there are two ways to obtain reduction in power dissipation using this approach. First, we get a direct reduction in power dissipation due to removal of redundant computation. This advantage appears in the form of reduced *switching activity* because of relatively fewer computational operations. Second, we can obtain multiplierless implementations, which are of immense interest in high-speed signal processing applications, and, which can also be used to further reduce power levels by employing *voltage scaling*.

Consider a *linear time-invariant* (LTI) FIR filter of length M described by an input-output relationship of the form

$$y(n) = \sum_{i=0}^{M-1} c_i x(n - i) = \sum_{i=0}^{M-1} p_i^{(n)} \quad (1)$$

In this context,  $c_i$  represents the *i*th coefficient and  $x(n - i)$  denotes the data sample at time instant  $n - i$ .  $p_i^{(n)}$  represents the partial product  $c_i x(n - i)$  for  $i = 0, 1, \dots, M - 1$  computed at time instant  $n$ . Figure 1 shows a graph  $G = \{V, E\}$  representation of a 4-tap ( $M = 4$ ) FIR filter in which each vertex represents a coefficient and the edge  $E_{i,j}$ ,  $i, j = 0, 1, 2, 3$ , represents the resources required to multiply a data sample with the preceding vertex (i.e. coefficient  $c_i$ ). If an array multiplier is used to compute the products,  $E_{i,j}$  represents the number of rows of adders required to implement the multiplier, given by the number of 1-bits in  $c_i$ .  $M = 4$  parallel multipliers are required to obtain a parallel implementation of the M-tap filter.  $E_{i,j}$  depends only on the number representation scheme and the type of multiplier employed. Further,  $G$  is undirected and  $E_{i,j} = E_{j,i}$  for all  $i, j = 0, 1, \dots, M - 1$ .
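Equation 1 and the edge cost for an array multiplier can be sketched as follows. The sketch assumes integer (fixed-point) coefficients and zero initial conditions; the function names are hypothetical and not taken from the report.

```python
def fir_output(coeffs, x, n):
    """y(n) = sum_i c_i * x(n - i), with x(k) = 0 outside the stored samples."""
    return sum(c * x[n - i] for i, c in enumerate(coeffs) if 0 <= n - i < len(x))


def adder_rows(c):
    """Rows of adders in an array multiplier: number of 1-bits in |c|."""
    return bin(abs(c)).count("1")
```

For the coefficients `[3, 5, 6, 7]` and input `[1, 2, 3, 4]`, `fir_output` at `n = 3` evaluates $3 \cdot 4 + 5 \cdot 3 + 6 \cdot 2 + 7 \cdot 1 = 46$, and the coefficient 7 (binary 111) costs three rows of adders.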



Figure 1: Graph representation of an example filter with  $M = 4$



Figure 2: Graph representation of an example filter with  $M = 8$ .

With the above interpretation of the graph, the output in equation 1 can be calculated by a tour along the graph at time instant  $n$ . Figure 2(a) shows one such tour in  $G$  which consists of edges  $E_{i,(i+1) \bmod M}$ ,  $i = 0, 1, \dots, M - 1$  for an  $M = 8$  tap filter. The coefficients are applied such that  $c_{j+1}$  follows  $c_j$ ,  $j = 0, \dots, M - 2$ . The appropriate data sample with the corresponding coefficient is shown next to each edge. The total resources required to compute the output given by equation 1 at time instant  $n$  are given by the sum of the resources required to compute the partial products ( $p_i^{(n)}$ 's) along each edge in the tour. At the next time instant,  $n + 1$ , each data sample  $x(i)$ ,  $i = n, n - 1, \dots, n - M + 1$  in the graph is replaced by  $x(i + 1)$ . The outputs of the filter at time instants  $n - 1$  and  $n$  are given as

$$y(n - 1) = c_0 x(n - 1) + c_1 x(n - 2) + \dots + c_{M-1} x(n - M) \quad (2)$$

$$= p_0^{(n-1)} + p_1^{(n-1)} + \dots + p_{M-1}^{(n-1)}$$

$$y(n) = c_0 x(n) + c_1 x(n - 1) + \dots + c_{M-1} x(n - M + 1) \quad (3)$$

$$= p_0^{(n)} + p_1^{(n)} + \dots + p_{M-1}^{(n)}$$

Consider the tour in figure 2(b). Suppose that this order yields differential coefficients which are simpler to implement (e.g. they may be powers-of-two), and hence, the implementation so obtained has lower complexity. Note that in this example, the ordering is given by  $c_0, c_4, c_5, c_1, c_2, c_6, c_7, c_3$ . The corresponding data sample  $x(n - i)$  migrates from the edge  $E_{i,j}$  to  $E_{i,k}$ , such that if  $T'$  is the new tour,  $E_{i,k} \in T'$ ,  $k \neq j$ . This is shown in figure 2, where  $x(n - i)$  now refers to the new edge originating at  $c_i$ . Next, we can calculate  $p_i^{(n)}$  for  $i = 0, 1, \dots, M - 1$ . For simplicity in notation, let  $\mathbf{K} = \{k_0, k_1, \dots, k_{M-1}\}$  be the set representing the indices of coefficients in the new ordering. Hence, for the example in figure 2(b),  $\mathbf{K} = \{0, 4, 5, 1, 2, 6, 7, 3\}$ . Then, the new differential coefficients for the order sequence in  $\mathbf{K}$  are given by the differences  $c_{k_{i+1}} - c_{k_i}$ ,  $i = 0, 1, \dots, M - 1$ .

The implementation which constructs a tour with the least resources (total number of adders) can be obtained by computing the Hamiltonian path with smallest weight in  $G$ . This path can be found by employing one of the known methods of solving the *traveling salesman problem* (TSP). Hence, the DCMI approach computes a minimum-weight *Hamiltonian path* in  $G$ .
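A nearest-neighbour heuristic gives a simple, polynomial-time way to build a low-weight Hamiltonian path over the coefficients. The sketch below is an illustration under assumptions, not the report's implementation: it uses integer coefficients, takes the edge weight to be the number of 1-bits in the magnitude of the coefficient difference, and breaks ties by index.

```python
def ones(v):
    """Number of 1-bits in |v| (adder cost in a sign-magnitude view)."""
    return bin(abs(v)).count("1")


def greedy_order(coeffs, start=0):
    """Nearest-neighbour heuristic for a low-weight Hamiltonian path.

    Edge weight between coefficients i and j is the number of 1-bits in
    |c_j - c_i|, i.e. the adder cost of the differential coefficient.
    """
    order = [start]
    remaining = set(range(len(coeffs))) - {start}
    while remaining:
        last = coeffs[order[-1]]
        nxt = min(remaining, key=lambda j: (ones(coeffs[j] - last), j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def path_cost(coeffs, order):
    """Total adder cost of the differential coefficients along the path."""
    return sum(ones(coeffs[b] - coeffs[a]) for a, b in zip(order, order[1:]))
```

For the coefficients `[12, 23, 14, 20]`, the natural order costs 7 while the greedy order `[0, 2, 1, 3]` costs 5. A greedy path is not guaranteed to be optimal, and the first coefficient in the path must still be implemented in full.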

Figures 3 and 4 show a relative comparison of the average number of adders per differential coefficient obtained using the first-order DCMI solutions for *sign-magnitude* (SM) and *signed-power-of-two* (SPT) number representations, respectively. We compare the number of adders per differential coefficient for 8, 16 and 24 bit coefficients. The example filters considered were 28-tap PM, 41-tap LS, 119-tap PM, 172-tap LS, 131-tap PM, 170-tap LS, 151-tap PM and 217-tap LS. These results were obtained using the greedy strategy for first-order DCMI. We note that SPT implementations require fewer adders than SM implementations for all word-lengths. We also observe a linear relationship between the average number of adders per differential coefficient and the word-length. Further, the average number of adders per differential coefficient decreases, in general, as the length of the filter increases. We note that traditional approaches to finding multiplierless implementations for word-lengths  $> 16$  would take enormous computational effort and may not yield good solutions. In contrast, our technique takes polynomial time, independent of the word-length and the number representation scheme, and can be used to obtain good DCMI solutions for large filters within a few minutes of CPU time.



Figure 3: Average Number of Adders per First-Order Differential Coefficient For Sign-Magnitude Number Representation.



Figure 4: Average Number of Adders per First-Order Differential Coefficient For SPT Number Representation.

## Bibliography

- [1] S. Prasad and K. Roy, "Transistor Reordering for Power Minimization under Delay Constraint," *ACM Transactions on Design Automation of Electronic Systems*, Vol. 1, No. 2, April 1996, pp. 280-300.
- [2] D. Somasekhar and K. Roy, "Differential Current Switch Logic: A Low-Power DCVS Logic Family," *IEEE Journal of Solid-State Circuits*, July 1996, pp. 981-991.
- [3] T.-L. Chou and K. Roy, "Accurate Estimation of Power Dissipation in CMOS Sequential Circuits," *IEEE Transactions on VLSI Systems*, September 1996, pp. 369-380.
- [4] Y. Ye and K. Roy, "Energy Recovery Circuits Using Reversible and Partially Reversible Logic," *IEEE Transactions on Circuits and Systems I. Fundamental Theory and Applications*, September 1996, pp. 769-778.
- [5] T.-L. Chou and K. Roy, "Estimation of Activity for Static and Domino CMOS Circuits Considering Signal Correlations and Simultaneous Switching," *IEEE Transactions on ComputerAided Design of Integrated Circuits*, October 1996, pp. 1257-1265.
- [6] N. Sankarayya, K. Roy, and D. Bhattacharya, "Algorithms for Low Power High Speed FIR Filter Realization Using Differential Coefficients," *IEEE Transactions on CircuitS and SY-SteMS: Analog and Digital Signal Processing*, June 1997, pp. 488-497.
- [7] M. Johnson and K. Roy, "Datapath Scheduling with Multiple Supply Voltages and Voltage Converters," *ACM Transactions on Design Automation of Electronic Systems*, July 1997.
- [8] T.-L. Chou and K. Roy, "Statistical Estimation of Digital Circuit Activity Considering Uncertainty of Gate Delays," *IEICE (Japan) Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, special issue on VLSI Design and CAD Algorithms, October 1997, pp. 1915-1923.
- [9] Y. Ye and K. Roy, "An XOR-Based Decomposition Diagram and Its Application in Synthesis of AND-XOR Network," *IEICE (Japan) Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, special issue on VLSI Design and CAD Algorithms, October 1997, pp. 1742-1748.
- [10] S. Nag and K. Roy, "Performance and Wireability Driven Layout for Row-Based FPGAS," *Journal of VLSI Design*, accepted for publication.
- [11] K. Roy, "Power Dissipation Driven FPGA Place and Route Under Timing Constraints," *IEEE Transactions on Circuits and Systems 1. Fundamental Theory and Applications*, accepted for publication.
- [12] T.-L. Chou and K. Roy, "Power Estimation Under Uncertain Delays," *Integrated ComputerAided Engineering*, Wiley-Intersciece, pp. 107-116, vol. 5, no. 2, 1998.
- [13] C.-Y. Wang and K. Roy, "Maximum Power Estimation for CMOS Circuits Using Deterministic and Statistical Techniques," *IEEE Transactions on VLSI Systems*, March 1998, pp. 134-140.

- [14] Z. Chen, K. Roy, and T.-L. Chou, "Efficient Statistical Approach to Estimate Power Considering Uncertain Properties of Primary Inputs," *IEEE Transactions on VLSI Systems*, pp. 484-492, September 1998.
- [15] D. Somasekhar and K. Roy, "LVDCSL: A High-Performance Low-Power Logic Using High Fanin Gates," *IEEE Transactions on VLSI Systems*, Special issue on low-power design, December 1998, pp. 573-577.
- [16] L. Wei, Z. Chen, M. Johnson, and K. Roy, "Design and Optimization of Dual Threshold Circuits for Low Voltage Low Power Applications," *IEEE Transactions on VLSI Systems*, Special issue on low-power electronics and design, March 1998, pp. 16-24.
- [17] C. Wang and K. Roy, "Control Unit Synthesis Targeting Low-Power Processors," *IEEE Transactions on VLSI Systems*, March 1999, pp. 130-134.
- [18] H. Soeleman, K. Roy, and T.-L. Chou, "Accurate Power Estimation at the Transistor and Gate Level," *IEEE Design and Test of Computers*, to appear.
- [19] Y. Ye and K. Roy, "QSERL: Quasi-Static Energy Recovery Logic," *IEEE Journal of SolidState Circuits*, to appear.
- [20] M. Johnson, D. Somasekhar, and K. Roy, "Deterministic Estimation of Minimum and Maximum Leakage Conditions in CMOS Logic," *IEEE Transactions on Computer-Aided Design of IC's*, to appear.
- [21] C-Y. Wang and K. Roy, "Estimation of Lower Bound on Maximum Power for Sequential Circuits," *IEEE Transactions on Circuits and Systems - I*, to appear.
- [22] T-L. Chou, K. Roy, and S. Prasad, "Estimation of Circuit Activity Considering Signal Correlations and Simultaneous Switching," IEEE International Conference on Computer-Aided Design, November 1994, pp. 300-303.
- [23] T.-L. Chou and K. Roy, "Accurate Estimation of Power Dissipation in CMOS Sequential Circuits," *IEEE International ASIC Conference*, Austin, Texas, September 1995, pp. 285-288.
- [24] C.-Y. Wang and K. Roy, "Control Unit Synthesis Targeting Low-Power Processors," IEEE International Conference on Computer Design, Austin, Texas, October 1995, pp. 454-459.
- [25] T.-L. Chou and K. Roy, "Estimation of Sequential Circuit Activity Considering Spatial and Temporal Correlations," IEEE International Conference on Computer Design, Austin, Texas, October 1995, pp. 577-582.
- [26] Y. Ye and K. Roy, "Low Power Circuit Design using Adiabatic Switching Principle," *IEEE Midwest Symposium on Circuits and Systems*, 1995, pp. 1189-1193.
- [27] D. Somasekhar and K. Roy, "Differential Current Switch Logic: A Low Power DCVS Logic Family," 21st *European Solid-State Circuits Conference*, September 1995, pp. 182-185.
- [28] D. Somasekhar, Y. Ye, and K. Roy, "An Energy Recovery Static RAM Memory Core," *IEEE Symposium on Low Power Electronics*, October 1995, pp. 62-63.

- [29] T.-L. Chou and K. Roy, "Statistical Estimation of Sequential Circuit Activity," *IEEEIACM International Conference on Computer-Aided Design*, November 1995, pp. 34-37.
- [30] C.-Y. Wang and K. Roy, "Maximum Power Estimation for CMOS Circuits Using Deterministic and Statistical Approaches," *IEEE VLSI Design Conference*, January 1996, pp. 364-369.
- [31] C.-Y. Wang, T.-L. Chou, and K. Roy, "Maximum Power Estimation for CMOS Circuits Under Arbitrary Delay Model," *IEEE International Symposium on Circuits and Systems*, May 1996.
- [32] C.-Y. Wang, K. Roy, and T.-L. Chou, "Maximum Power Estimation for Sequential Circuits Using a Test Generation Based Technique," *IEEE Custom Integrated Circuits Conference*, May 1996, pp. 229-232.
- [33] M. Johnson and K. Roy, "Optimal Selection of Supply Voltages and Level Conversions During Datapath Scheduling Under Resource Constraints," *IEEE Intl. Conf. on Computer Design*, October 1996, pp. 72-77.
- [34] N. Sankarayya, K. Roy, and D. Bhattacharya, "Algorithms for Low Power FIR Filter Realization using Differential Coefficients," *IEEE VLSI Design Conference*, January 1997, pp. 174-178.
- [35] P. Patil, T.-L. Chou, K. Roy, and R. Roy, "Low-Power Driven Logic Synthesis Using Accurate Power Estimation Technique," *IEEE VLSI Design Conference*, January 1997, pp. 179-183.
- [36] T.-L. Chou and K. Roy, "Statistical Estimation of Combinational and Sequential CMOS Digital Circuit Activity Considering Uncertainty of Gate Delays," *1997 Asia & South Pacific Design Automation Conference - ASP-DAC*, January 1997, pp. 95-100.
- [37] Y. Ye and K. Roy, "Efficient Synthesis of AND/XOR Networks," *1997 Asia & South Pacific Design Automation Conference - ASP-DAC*, January 1997, pp. 539-544.
- [38] K. Roy and R. Roy, "Low-Power Design: Estimation and Synthesis Techniques," tutorial presentation at the *1997 Asia & South Pacific Design Automation Conference - ASP-DAC*, January 1997.
- [39] Z. Chen, K. Roy, and T.-L. Chou, "Sensitivity of Power Dissipation to Uncertainties in Primary Input Specification," *IEEE Custom Integrated Circuits Conference*, 1997, pp. 487-490.
- [40] Y. Ye and K. Roy, "A Graph-based Synthesis Algorithm for Multi-Level AND/XOR Networks," *ACM/IEEE Design Automation Conference*, 1997, pp. 107-112.
- [41] M. Johnson and K. Roy, "Scheduling and Optimal Voltage Selection for Low Power MultiVoltage DSP Datapaths," *1'EEE International Symposium on Circuits and Systems*, Hongkong, May 1997, invited paper.
- [42] K. Muhammad and K. Roy, "Low Power Digital Filters Based on Constrained Least Squares Solution," *31st Asilomar Conference on Signals, Systems, and Computers*, 1997, invited paper.
- [43] Y. Ye and K. Roy, "Adiabatic and Quasi-CMOS Logic Design," *European Conference on Circuit Theory and Design*, 1997, invited paper.

- [44] K. Muhammed and K. Roy, "On Power Reduction of FIR Digital Filters Using Constrained Least Square Solutions," *IEEE International Conference on Computer Design*, 1997, pp. 196-201.
- [45] C.-Y. Wang and K. Roy, "An ATG-Based Maximum Power Estimation Technique Considering Spurious Transitions," *IEEE International Conference on Computer Design*, 1997, pp. 746-751.
- [46] D. Somasekhar and K. Roy, "LVDCSL: Low Voltage Differential Current Switch Logic, A Robust Low Power DCSL Family," *1997 International Symposium on Low Power Electronics and Design*, pp. 18-23.
- [47] Y. Ye, K. Roy, and G. Stamoulis, "Quasi-Static Energy Recovery Logic and Supply Clock Generation Circuits," *1997 International Symposium on Low Power Electronics and Design*, pp. 96-99.
- [48] R. Roy, K. Roy, and A. Chatterjee, "Stress Testing: A Low-Cost Alternative for Burn-in," *VLSI'97*, Sep. 1997, Brazil.
- [49] Z. Chen and K. Roy, "An Efficient Statistical Method to Estimate Average Power in Sequential Circuits Considering Input Uncertainties," *IEEE International ASIC Conference*, Portland, 1997, pp. 189-193.
- [50] A. Keshavarzi, K. Roy, and C. Hawkins, "Intrinsic IDDQ: Origins, Reduction, and Applications in Deep Sub-μ Low-Power CMOS IC's," *IEEE International Test Conference*, 1997, pp. 146-155.
- [51] C.-Y. Wang and K. Roy, "COSMOS: A Continuous Optimization Approach for Maximum Power Estimation of CMOS Circuits," *ACM/IEEE International Conference on Computer-Aided Design*, Santa Clara, 1997, pp. 52-55.
- [52] Z. Chen, K. Roy, and T.-L. Chou, "Power Sensitivity - A New Method to Estimate Power Considering Uncertain Specifications of Primary Inputs," *ACM/IEEE International Conference on Computer-Aided Design*, Santa Clara, 1997, pp. 40-44.
- [53] N. Sankarayya, K. Roy, and D. Bhattacharya, "Optimizing Computations for Reducing Energy Dissipation in Realization of High Speed LTI-FIR Systems," *ACM/IEEE International Conference on Computer-Aided Design*, Santa Clara, 1997, pp. 120-125.
- [54] Y. Ye, K. Roy, and R. Drechsler, "On Power Dissipation in AND-XOR Circuits," *Workshop on Reed-Muller Circuits*, Oxford, England, 1997, pp. 75-84.
- [55] L. Wei, Z. Chen, and K. Roy, "Double Gate Dynamic Threshold Voltage (DGDT) SOI MOSFETs for Low Power High Performance Designs," *IEEE SOI Conference*, 1997, pp. 82-83.
- [56] H. Soeleman, D. Somasekhar, and K. Roy, "IDD Waveform Analysis for Testing of Domino and Low Voltage Static CMOS Circuits," *IEEE Great Lakes Symp. on VLSI*, Feb. 1998.
- [57] Z. Chen, K. Roy, and Y. Ye, "Estimation of Average Switching Power Under Accurate Modeling of Signal Correlations," *IEEE Custom Integrated Circuits Conference*, May 1998.
- [58] C. Hawkins, A. Keshavarzi, and K. Roy, "High Performance CMOS IC Challenges in IDDQ Testing," *IEEE European Test Workshop*, May 27-29, Barcelona, 1998.
- [59] L. Wei, Z. Chen, and K. Roy, "Design and Optimization of Low Voltage High Performance Dual Threshold CMOS Circuits," *IEEE/ACM Design Automation Conference*, San Francisco, 1998.
- [60] Z. Chen and K. Roy, "A Novel Power Macromodeling Technique Based on Power Sensitivity," *IEEE/ACM Design Automation Conference*, San Francisco, June 1998.
- [61] Z. Chen, L. Wei, M. Johnson, and K. Roy, "Estimation of Standby Leakage Power in CMOS Circuits," *Intl. Symp. on Low Power Electronics and Design*, Monterey, CA, August 1998, pp. 239-244.
- [62] K. Roy, "Leakage Power Reduction in Low-Voltage CMOS Designs," Invited paper, *5th IEEE International Conference on Electronics, Circuits and Systems*, September 7-10, 1998, pp. 2.167-2.173.
- [63] L. Wei and K. Roy, "Design and Optimization for Low-Leakage with Multiple Threshold CMOS," Invited paper, *International Workshop on Power and Timing Modeling, Optimization and Simulation*, Lyngby, Denmark, October 7, 1998, pp. 3-7.
- [64] K. Muhammad, E. Amyeen, K. Roy, and W. Fuchs, "IDD Waveform Analysis for Testing of Digital CMOS Circuits," *4th IEEE International On-Line Testing Workshop*, July 6-8, Capri, Italy, 1998, pp. 66-70.
- [65] Z. Chen, K. Roy, and E. Chong, "Estimation of Power Sensitivity for Sequential Circuits With Application to Power Macromodeling," *IEEE/ACM International Conference on Computer-Aided Design*, Nov. 1998, pp. 468-472.
- [66] L. Wei, Z. Chen, and K. Roy, "Design and Optimization of Double-Gate Fully-Depleted SOI MOSFETs for Low Voltage Low Power CMOS Circuits," *IEEE International SOI Conference*, 1998, pp. 69-70.
- [67] K. Muhammad, D. Somasekhar, and K. Roy, "Low Energy Computing for Portable and Wireless Applications," Invited paper, *1998 Norchip Conference*, Lund, Sweden, Nov. 9-10, 1998, pp. 183-190.
- [68] X. Zhang, K. Roy, and S. Bhawmik, "Low-Power BIST," *IEEE VLSI Conference*, Jan. 1998, pp. 416-422.
- [69] K. Roy, "Multiple Supply CMOS for DSP Datapaths," *1999 IEEE International Symposium on Circuits and Systems*, special session invited paper, to appear.
- [70] Y. Ye, K. Roy, and R. Drechsler, "Power Consumption in AND-XOR Based Circuits," *Asia & South Pacific - Design Automation Conference*, to appear in Jan. 1998.
- [71] M. Lundberg, K. Muhammad, K. Roy, and S. Wilson, "High Level Modeling of Switching Activity With Application to Low-Power DSP System Synthesis," *ICASSP-99*, to appear in March 1999.
- [72] M. Johnson, D. Somasekhar, and K. Roy, "Leakage Control with Efficient Use of Transistor Stacks in Single Threshold CMOS," *1999 Design Automation Conference*, to appear in June 1999.

- [73] L. Wei, Z. Chen, K. Roy, Y. Ye, and V. De, "Mixed-Vth (MVT) CMOS Circuit Design Methodology for Low Power Applications," *1999 Design Automation Conference*, to appear in June 1999.
- [74] X. Zhang and K. Roy, "Design and Synthesis of Programmable Low Power Weighted Random Pattern Generator," *IEEE International On-Line Testing Workshop*, July 1999.
- [75] H. Soeleman and K. Roy, "Ultra Low Power Digital Sub-Threshold Logic," *International Symp. on Low-Power Electronics and Design*, August 1999.
- [76] A. Keshavarzi, S. Narendra, C. Hawkins, K. Roy, S. Borkar, and V. De, "Optimum Reverse Body Bias for Standby Power Reduction in Logic CMOS IC's," *International Symp. on Low-Power Electronics and Design*, August 1999.